Assignment_4: Swimming Scholarship Enrollment Clustering
Data Dictionary
Coding:
Part 1: K-Means Clustering (Coding) - 35%
1. Data Exploration and Preprocessing
o Load and inspect the dataset.
o Perform Exploratory Data Analysis (EDA) to understand the distribution and patterns.
o Separate categorical and numerical features.
o Apply One-Hot Encoding to categorical variables.
o Standardize/Normalize numerical features for clustering.
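The preprocessing steps listed above could be combined into a single transformer. The following is a minimal sketch on a made-up four-row frame, not the notebook's own approach: the frame `demo`, its values, and the choice of `StandardScaler` plus `OneHotEncoder` inside a `ColumnTransformer` are assumptions for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical miniature of the applicant table (values are made up)
demo = pd.DataFrame({
    "High School GPA": [2.1, 2.9, 3.9, 3.4],
    "Swimming Time 100m (sec)": [50.5, 54.2, 60.8, 58.0],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Swimming Type": ["Butterfly", "Freestyle", "Breaststroke", "Freestyle"],
})

# Separate numerical and categorical features by dtype
num_cols = demo.select_dtypes(include="number").columns.tolist()
cat_cols = demo.select_dtypes(exclude="number").columns.tolist()

preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), num_cols),                        # zero mean, unit variance
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),  # one column per category
    ],
    sparse_threshold=0,  # always return a dense array
)
X_demo = preprocess.fit_transform(demo)
print(X_demo.shape)
```

Keeping both steps in one object means the same fitted scaling and encoding can be reapplied to new rows with `preprocess.transform`.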
In [1]:
import warnings
# Core libraries for data manipulation and numerical operations
import numpy as np
import pandas as pd
# Plotting and statistical visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Hierarchical clustering utilities (linkage, dendrogram) and Voronoi helpers
import scipy.cluster.hierarchy as sch
from scipy.spatial import Voronoi, voronoi_plot_2d
# Feature scaling, clustering algorithms, and clustering evaluation metrics
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.metrics import silhouette_score, silhouette_samples
# Ignore warnings for cleaner output
warnings.filterwarnings("ignore")
# Mount Google Drive (specific to Google Colab) to access data files stored on Google Drive
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [2]:
import os
# Define the file path
file_path = '/content/drive/My Drive/Assignment4/swimming_scholarship_dataset_expanded.csv' # Replace with your file's path
# Check if the file exists
if os.path.exists(file_path):
    print("File exists!")
else:
    print("File does not exist.")
File exists!
In [3]:
df = pd.read_csv(file_path)
In [4]:
df.sample(3)
Out[4]:
| | Application ID | Gender | Age | State | High School GPA | Swimming Time 100m (sec) | Swimming Time 200m (sec) | Swimming Time 400m (sec) | Swimming Type | Distance Specialization | Swim Club Membership | Years Competitive Swimming | Height (cm) | Weight (kg) | Academic Interest | Parent Support Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 299 | APP0300 | Female | 14 | NC | 2.10 | 50.49 | 121.62 | 222.14 | Butterfly | 400m | Yes | 4 | 174 | 80 | Biology | High |
| 459 | APP0460 | Male | 16 | OH | 2.91 | 54.21 | 132.81 | 284.41 | Freestyle | 400m | No | 1 | 195 | 84 | Economics | Medium |
| 138 | APP0139 | Female | 15 | PA | 3.91 | 60.79 | 135.45 | 370.12 | Breaststroke | 100m | Yes | 2 | 186 | 98 | Mathematics | High |
In [5]:
# Check the basic structure of the dataset
print(df.info())
print(df.describe())
# Display three randomly sampled rows to understand the data
df.sample(3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Application ID 500 non-null object
1 Gender 500 non-null object
2 Age 500 non-null int64
3 State 500 non-null object
4 High School GPA 500 non-null float64
5 Swimming Time 100m (sec) 500 non-null float64
6 Swimming Time 200m (sec) 500 non-null float64
7 Swimming Time 400m (sec) 500 non-null float64
8 Swimming Type 500 non-null object
9 Distance Specialization 500 non-null object
10 Swim Club Membership 500 non-null object
11 Years Competitive Swimming 500 non-null int64
12 Height (cm) 500 non-null int64
13 Weight (kg) 500 non-null int64
14 Academic Interest 500 non-null object
15 Parent Support Level 500 non-null object
dtypes: float64(4), int64(4), object(8)
memory usage: 62.6+ KB
None
Age High School GPA Swimming Time 100m (sec) \
count 500.000000 500.000000 500.000000
mean 15.970000 2.978520 60.173380
std 1.425891 0.587528 5.883978
min 14.000000 2.010000 50.130000
25% 15.000000 2.447500 54.925000
50% 16.000000 2.965000 60.390000
75% 17.000000 3.492500 65.347500
max 18.000000 4.000000 69.960000
Swimming Time 200m (sec) Swimming Time 400m (sec) \
count 500.000000 500.000000
mean 125.153360 299.176960
std 14.514755 56.941798
min 100.250000 200.310000
25% 112.012500 251.987500
50% 125.935000 300.345000
75% 137.040000 346.507500
max 149.970000 399.670000
Years Competitive Swimming Height (cm) Weight (kg)
count 500.000000 500.000000 500.000000
mean 5.376000 174.844000 72.856000
std 2.844434 14.903822 16.470027
min 1.000000 150.000000 45.000000
25% 3.000000 162.750000 59.000000
50% 5.000000 175.000000 73.000000
75% 8.000000 188.000000 87.000000
max 10.000000 200.000000 100.000000
Out[5]:
| | Application ID | Gender | Age | State | High School GPA | Swimming Time 100m (sec) | Swimming Time 200m (sec) | Swimming Time 400m (sec) | Swimming Type | Distance Specialization | Swim Club Membership | Years Competitive Swimming | Height (cm) | Weight (kg) | Academic Interest | Parent Support Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 349 | APP0350 | Male | 18 | GA | 3.88 | 59.63 | 110.76 | 381.93 | Freestyle | 100m | Yes | 10 | 156 | 96 | Physics | High |
| 198 | APP0199 | Female | 18 | IL | 2.34 | 51.16 | 109.59 | 383.02 | Butterfly | 200m | No | 8 | 171 | 59 | Physics | High |
| 90 | APP0091 | Male | 16 | NC | 2.55 | 55.44 | 111.36 | 322.05 | Butterfly | 400m | No | 6 | 154 | 65 | Economics | High |
In [6]:
# Check for missing values in each column
print(df.isnull().sum())
Application ID                0
Gender                        0
Age                           0
State                         0
High School GPA               0
Swimming Time 100m (sec)      0
Swimming Time 200m (sec)      0
Swimming Time 400m (sec)      0
Swimming Type                 0
Distance Specialization       0
Swim Club Membership          0
Years Competitive Swimming    0
Height (cm)                   0
Weight (kg)                   0
Academic Interest             0
Parent Support Level          0
dtype: int64
In [7]:
df = df.drop(columns=['Application ID', 'Academic Interest', 'Age', 'Height (cm)', 'Weight (kg)', 'Parent Support Level'])
In [8]:
numerical_columns = df.select_dtypes(include=['number'])
In [9]:
categorical_columns = df.select_dtypes(include=['object', 'category'])
In [10]:
print("Numerical columns:", numerical_columns.columns)
print("Categorical columns:", categorical_columns.columns)
Numerical columns: Index(['High School GPA', 'Swimming Time 100m (sec)',
'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
'Years Competitive Swimming'],
dtype='object')
Categorical columns: Index(['Gender', 'State', 'Swimming Type', 'Distance Specialization',
'Swim Club Membership'],
dtype='object')
In [ ]:
# Plot count plots for each categorical column with a custom palette and annotations
for column in categorical_columns:
    plt.figure(figsize=(10, 6))
    ax = sns.countplot(data=df, x=column, palette='viridis')
    # Add count annotations on top of the bars
    for p in ax.patches:
        ax.annotate(f'{int(p.get_height())}',  # Bar height as an integer count
                    (p.get_x() + p.get_width() / 2., p.get_height()),
                    ha='center', va='bottom',
                    xytext=(0, 5),  # Place the label 5 points above the bar
                    fontsize=10,
                    textcoords='offset points')
    plt.title(f'Distribution of {column}', fontsize=12, pad=15)
    plt.xticks(rotation=30)
    plt.tight_layout()
    plt.show()
In [ ]:
# Plot a histogram with a KDE overlay for each numerical column
numerical_columns = df.select_dtypes(include=['number'])
for column in numerical_columns:
    plt.figure(figsize=(8, 4))
    sns.histplot(df[column], kde=True, color='skyblue')
    plt.title(f'Distribution of {column}')
    plt.show()
In [ ]:
# Define numerical features
numerical_features = ['High School GPA', 'Swimming Time 100m (sec)',
                      'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
                      'Years Competitive Swimming']
# Create a 2x4 grid of subplots; only the first five axes are used
fig, axes = plt.subplots(2, 4, figsize=(20, 10))
axes = axes.flatten()  # Flatten to 1D array for easier indexing
# Define custom colors with better contrast
colors = ['#2E86C1', '#E74C3C', '#27AE60', '#8E44AD',
          '#F39C12']
# Loop through numerical features and plot each in a subplot
for i, feature in enumerate(numerical_features):
    sns.boxplot(y=df[feature], ax=axes[i], color=colors[i])
    axes[i].set_title(f'Boxplot of {feature}', fontsize=12, pad=10)
    axes[i].set_ylabel(feature, fontsize=10)
    axes[i].tick_params(axis='both', labelsize=9)  # Use smaller tick labels
# Hide the unused axes in the grid
for ax in axes[len(numerical_features):]:
    ax.set_visible(False)
plt.tight_layout(pad=3.0)
plt.show()
In [ ]:
# Correlation heatmap to study relationships between numerical features
correlation_matrix = df[numerical_features].corr()
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()
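A heatmap shows every pairwise correlation at once, but the strongest pairs can also be listed directly. The helper below is a hedged sketch: `top_correlated_pairs` and the toy frame `demo` are hypothetical names and values, not part of the assignment.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(frame: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n feature pairs with the largest absolute Pearson correlation."""
    corr = frame.corr().abs()
    # Keep only the strict upper triangle so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)

# Toy data: "b" is an exact multiple of "a", "c" alternates in sign
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, -1.0, 1.0, -1.0],
})
top = top_correlated_pairs(demo, n=1)
print(top)
```

On the notebook's data the call would be `top_correlated_pairs(df[numerical_features])`, assuming `numerical_features` is defined as in the boxplot cell.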
In [ ]:
# Pairplot to visualize the relationships between features
sns.pairplot(df, vars=numerical_features, hue="Gender")
plt.show()
In [ ]:
# Scale the numerical features to the [0, 1] range (min-max normalization)
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df[['High School GPA', 'Swimming Time 100m (sec)',
                                     'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
                                     'Years Competitive Swimming']])  # Returns a NumPy array; 'Application ID' was dropped earlier
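The task list allows either standardizing or normalizing, and the cell above uses min-max normalization. As a small worked example on three made-up swim times, the two options differ as follows (the array `times` is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy column of three swim times in seconds (made-up values)
times = np.array([[50.0], [60.0], [70.0]])

# Min-max normalization: (x - min) / (max - min), so values land in [0, 1]
minmax = MinMaxScaler().fit_transform(times)

# Standardization: (x - mean) / std, giving zero mean and unit variance
standard = StandardScaler().fit_transform(times)

print(minmax.ravel())
print(standard.ravel())
```

Both keep the ordering of the values; they differ in the range and spread that distance-based methods such as K-Means then see.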
In [ ]:
# One-hot encode the categorical columns; drop_first=True omits one reference level per column
categorical_columns = ['Gender', 'State', 'Swimming Type',
                       'Distance Specialization', 'Swim Club Membership']
df = pd.get_dummies(df, columns=categorical_columns, drop_first=True)
# The clustering below is fit on df_scaled (numerical features only), not on these encoded columns
In [ ]:
wcss = []
K_range = range(1, 11) # Testing k values from 1 to 10
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(df_scaled)
    wcss.append(kmeans.inertia_)
In [ ]:
# Plotting the Elbow Curve
plt.figure(figsize=(10, 6))
plt.plot(K_range, wcss, marker='o', linestyle='--')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Within-Cluster Sum of Squares (WCSS)')
plt.title('Elbow Method for Optimal K')
plt.grid(True)
plt.show()
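Reading the elbow off the plot is a visual judgment. One possible heuristic, sketched here on synthetic blobs rather than the scholarship data, picks the K where the WCSS curve bends most sharply (the largest second difference). The heuristic, the blob centers, and the name `elbow_k` are assumptions for illustration, not the method used in this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: three well-separated blobs (not the scholarship dataset)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
X_demo, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=42)

k_values = list(range(1, 8))
wcss_demo = [
    KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_demo).inertia_
    for k in k_values
]

# Second difference of the WCSS curve; entry i belongs to k_values[i + 1]
second_diff = np.diff(wcss_demo, n=2)
elbow_k = k_values[int(np.argmax(second_diff)) + 1]
print(elbow_k)
```

On data without clear structure the curve may have no pronounced bend, in which case the heuristic's answer should be treated with caution.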
In [ ]:
# Fit the final K-Means model with the chosen K (K=6)
kmeans = KMeans(n_clusters=6, random_state=42)  # Initialize KMeans with 6 clusters
kmeans_labels = kmeans.fit_predict(df_scaled)  # Fit KMeans and get cluster labels
df['KM'] = kmeans_labels  # Add the KMeans cluster labels to the original dataframe as 'KM'
In [ ]:
# Step 9: Add Cluster Labels to the Dataset
# Add the cluster labels to the original dataset for further analysis
df['Cluster'] = kmeans.labels_
In [ ]:
# View the scaled data as a NumPy array for plotting
scaled_data = np.asarray(df_scaled)
# Silhouette Scores for different k values
silhouette_scores = []
k_values = [1, 2, 3, 4, 5, 6]
plt.figure(figsize=(15, 8))
for idx, k in enumerate(k_values):
    kmeans = KMeans(n_clusters=k, random_state=42)
    cluster_labels = kmeans.fit_predict(df_scaled)
    plt.subplot(2, 3, idx + 1)
    if k > 1:
        # The silhouette score is only defined for two or more clusters
        silhouette_avg = silhouette_score(df_scaled, cluster_labels)
        silhouette_scores.append(silhouette_avg)
        plt.title(f'Silhouette Score for k={k}: {silhouette_avg:.2f}')
        plt.scatter(scaled_data[:, 0], scaled_data[:, 1], c=cluster_labels, cmap='viridis', s=50)
    else:
        plt.title(f'k={k}')
        plt.scatter(scaled_data[:, 0], scaled_data[:, 1], s=50)
plt.tight_layout()
plt.show()
In [ ]:
# Step 11: Evaluate the Clustering Performance (Silhouette Score)
# Compute the silhouette score, which evaluates how well each point lies within its cluster
k_optimal = 6  # K used for the final K-Means model
silhouette_avg = silhouette_score(df_scaled, kmeans.labels_)
print(f"\nSilhouette Score for K = {k_optimal}: {silhouette_avg:.2f}")
# Visualize the Silhouette Scores for Each Sample
silhouette_values = silhouette_samples(df_scaled, kmeans.labels_)
plt.figure(figsize=(10, 6))
y_lower = 10
for i in range(k_optimal):
    ith_cluster_silhouette_values = silhouette_values[kmeans.labels_ == i]
    ith_cluster_silhouette_values.sort()
    size_cluster_i = ith_cluster_silhouette_values.shape[0]
    y_upper = y_lower + size_cluster_i
    color = sns.color_palette('viridis', k_optimal)[i]
    plt.fill_betweenx(np.arange(y_lower, y_upper), 0, ith_cluster_silhouette_values, facecolor=color, edgecolor=color, alpha=0.7)
    plt.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
    y_lower = y_upper + 10  # 10 for the space between clusters
plt.xlabel("Silhouette Coefficient Values")
plt.ylabel("Cluster Label")
plt.title("Silhouette Plot for K-Means Clustering with K = " + str(k_optimal))
plt.axvline(x=silhouette_avg, color="red", linestyle="--")  # Mean silhouette score across all samples
plt.grid(True)
plt.show()
Silhouette Score for K = 6: 0.71
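For a single sample the silhouette coefficient is s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the mean distance to the nearest other cluster. The following worked example recomputes it by hand on four toy points and checks the result against scikit-learn; the helper `silhouette_by_hand` and the data are hypothetical.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

# Four 1-D points in two obvious clusters (toy data)
X_demo = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])

def silhouette_by_hand(X, labels, i):
    """s(i) = (b - a) / max(a, b) for sample i, using Euclidean distance."""
    dists = np.linalg.norm(X - X[i], axis=1)
    same = labels == labels[i]
    same[i] = False  # exclude the point itself from its own-cluster mean
    a = dists[same].mean()
    # b: smallest mean distance to the members of any other cluster
    b = min(dists[labels == other].mean() for other in set(labels) if other != labels[i])
    return (b - a) / max(a, b)

manual = np.array([silhouette_by_hand(X_demo, labels, i) for i in range(len(X_demo))])
print(manual.round(3))
print(np.allclose(manual, silhouette_samples(X_demo, labels)))
```

The red dashed line in the silhouette plot corresponds to the mean of these per-sample values, which is what `silhouette_score` returns.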
In [ ]:
# Step 12: Interpret the Results
# Analyze each cluster using the per-cluster mean of every numeric column to understand the applicant segments
# (means of other label columns such as 'KM' carry no interpretive value)
print("\nCluster Analysis:")
print(df.groupby('Cluster').agg('mean', numeric_only=True))
Cluster Analysis:
High School GPA Swimming Time 100m (sec) Swimming Time 200m (sec) \
Cluster
0 2.733404 54.431064 132.746915
1 2.633765 65.031412 139.420588
2 2.628167 62.093833 111.922833
3 3.129012 55.679630 114.462222
4 3.614388 60.245612 128.367449
5 2.964634 64.667683 118.059756
Swimming Time 400m (sec) Years Competitive Swimming Gender_Male \
Cluster
0 260.046277 6.500000 0.521277
1 326.796471 4.670588 0.576471
2 332.704167 2.716667 0.450000
3 343.921235 7.691358 0.419753
4 289.284592 2.755102 0.448980
5 258.495976 7.609756 0.548780
State_FL State_GA State_IL State_MI ... State_PA State_TX \
Cluster ...
0 0.031915 0.138298 0.106383 0.095745 ... 0.095745 0.117021
1 0.152941 0.070588 0.082353 0.129412 ... 0.105882 0.082353
2 0.150000 0.116667 0.083333 0.100000 ... 0.150000 0.033333
3 0.074074 0.123457 0.111111 0.160494 ... 0.086420 0.061728
4 0.071429 0.122449 0.163265 0.102041 ... 0.071429 0.091837
5 0.036585 0.048780 0.121951 0.097561 ... 0.109756 0.134146
Swimming Type_Butterfly Swimming Type_Freestyle \
Cluster
0 0.393617 0.329787
1 0.235294 0.317647
2 0.233333 0.383333
3 0.283951 0.308642
4 0.346939 0.316327
5 0.365854 0.365854
Distance Specialization_200m Distance Specialization_400m \
Cluster
0 0.457447 0.308511
1 0.223529 0.376471
2 0.283333 0.366667
3 0.308642 0.333333
4 0.357143 0.295918
5 0.341463 0.378049
Swim Club Membership_Yes KM DBSCAN Hierarchical
Cluster
0 0.404255 0.0 3.000000 1.010638
1 0.611765 1.0 2.000000 2.117647
2 0.533333 2.0 -0.016667 1.666667
3 0.419753 3.0 5.000000 1.802469
4 0.551020 4.0 1.000000 0.112245
5 0.512195 5.0 4.000000 0.560976
[6 rows x 23 columns]
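The table above also averages label columns (`KM`, `DBSCAN`, `Hierarchical`), and those means are not meaningful. A possible way to profile clusters on feature columns only, with the cluster size added, is sketched below on a made-up frame; `demo`, `OtherLabels`, and `n_applicants` are hypothetical names.

```python
import pandas as pd

# Made-up applicants with a cluster label and a second, unrelated label column
demo = pd.DataFrame({
    "High School GPA": [3.0, 3.5, 2.0, 2.5],
    "Swimming Time 100m (sec)": [55.0, 57.0, 65.0, 63.0],
    "Cluster": [0, 0, 1, 1],
    "OtherLabels": [2, 2, 0, 1],  # hypothetical labels from another algorithm
})

label_cols = ["Cluster", "OtherLabels"]
feature_cols = [c for c in demo.columns if c not in label_cols]

# Mean of each feature per cluster, plus the number of applicants in the cluster
profile = demo.groupby("Cluster")[feature_cols].mean()
profile["n_applicants"] = demo.groupby("Cluster").size()
print(profile)
```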
In [ ]:
# Step 15: Visualize Cluster Counts
# Use a count plot to visualize the number of data points in each cluster
plt.figure(figsize=(10, 6))
sns.countplot(x='Cluster', data=df, palette='viridis')
plt.xlabel('Cluster')
plt.ylabel('Number of Data Points')
plt.title('Count of Data Points in Each Cluster')
plt.grid(True)
plt.show()
In [ ]:
# Step 16: Pairplot for Cluster Analysis
# sns.pairplot creates its own figure, so no separate plt.figure call is needed
sns.pairplot(df[['High School GPA', 'Swimming Time 100m (sec)',
                 'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
                 'Years Competitive Swimming', 'Cluster']],
             hue='Cluster',
             palette='viridis',
             plot_kws={'alpha': 0.6, 's': 80},
             diag_kind='kde')  # Use KDE plots on diagonal for better distribution visualization
plt.suptitle('Pairplot of Student Performance Metrics by Cluster', y=1.02, fontsize=16)
plt.tight_layout()
plt.show()
In [ ]:
# Convert the scaled NumPy array back to a DataFrame with the original column names
df_scaled = pd.DataFrame(df_scaled, columns=numerical_columns.columns)
df_scaled['KM'] = kmeans_labels  # Add cluster labels; df_scaled now holds a label column next to the features
# Heatmap of the mean scaled feature value in each K-Means cluster
kmeans_feature_means = df_scaled.groupby('KM').mean()
plt.figure(figsize=(10, 6))
sns.heatmap(kmeans_feature_means, annot=True, cmap='coolwarm')
plt.title('Heatmap of Feature Means by KMeans Cluster')
plt.show()
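Once a label column such as `KM` sits in the same table as the features, passing the whole table to a later `fit` treats the label as a feature. The sketch below uses synthetic uniform data (not the scholarship dataset) to illustrate how that can raise the silhouette score without the data having any more structure; all names and values here are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Structureless synthetic data: 500 points, 5 features already in [0, 1]
X_demo = rng.uniform(size=(500, 5))

labels = KMeans(n_clusters=6, n_init=10, random_state=42).fit_predict(X_demo)
clean_score = silhouette_score(X_demo, labels)

# Appending the label column turns the cluster assignment into an input feature
X_leaky = np.column_stack([X_demo, labels])
leaky_labels = KMeans(n_clusters=6, n_init=10, random_state=42).fit_predict(X_leaky)
leaky_score = silhouette_score(X_leaky, leaky_labels)

print(f"features only: {clean_score:.2f}, with label column: {leaky_score:.2f}")
```

Keeping features and labels in separate objects, or selecting the feature columns explicitly before each `fit`, avoids the effect.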
In [ ]:
feature_combinations = [
('High School GPA', 'Swimming Time 100m (sec)'),
('High School GPA', 'Swimming Time 200m (sec)'),
('High School GPA', 'Swimming Time 400m (sec)'),
('High School GPA', 'Years Competitive Swimming')
]
fig, axes = plt.subplots(2, 2, figsize=(18, 12))
axes = axes.flatten()
fig.suptitle('Student Performance Feature Combinations', fontsize=16)
for i, (x_feature, y_feature) in enumerate(feature_combinations):
    ax = axes[i]
    sns.scatterplot(x=x_feature, y=y_feature, hue='KM', palette='viridis', data=df, s=100, alpha=0.7, ax=ax)
    ax.set_xlabel(x_feature)
    ax.set_ylabel(y_feature)
    ax.legend(title='Cluster')
    ax.grid(True)
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
In [ ]:
# Step 22.4: Hyperparameter Tuning of K-Means
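# Refit with more initializations (n_init=20) and a higher iteration cap (max_iter=500), then compare the silhouette score
# df_scaled is passed whole here, so the 'KM' label column added earlier is part of the input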
kmeans_tuned = KMeans(n_clusters=k_optimal, n_init=20, max_iter=500, random_state=42)
kmeans_tuned.fit(df_scaled)
silhouette_avg_tuned = silhouette_score(df_scaled, kmeans_tuned.labels_)
print(f"Silhouette Score with Tuned K-Means for K = {k_optimal}: {silhouette_avg_tuned:.2f}")
Silhouette Score with Tuned K-Means for K = 6: 0.71
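An unchanged silhouette score after raising `n_init` and `max_iter` suggests the solution was already stable. One way to check stability more directly is to compare labelings across random seeds with the adjusted Rand index. This is a sketch on synthetic blobs; `seed_agreement` is a hypothetical helper and the data are not the scholarship dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data: three well-separated blobs
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)],
    cluster_std=0.5,
    random_state=0,
)

def seed_agreement(X, k, seeds=(0, 1, 2, 3), n_init=20, max_iter=500):
    """Adjusted Rand index between the first seed's labels and each other seed's."""
    runs = [
        KMeans(n_clusters=k, n_init=n_init, max_iter=max_iter, random_state=s).fit_predict(X)
        for s in seeds
    ]
    return [adjusted_rand_score(runs[0], other) for other in runs[1:]]

agreement = seed_agreement(X_demo, k=3)
print(agreement)
```

Values near 1 mean the seeds agree on the partition; clearly lower values would indicate that the clustering depends on initialization.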
In [ ]:
# Step 22.5: Use PCA for Dimensionality Reduction before Clustering
# Reduce the dataset to fewer dimensions using PCA and then apply K-Means
from sklearn.decomposition import PCA  # For dimensionality reduction and visualizing high-dimensional data
pca = PCA(n_components=2)
X_pca = pca.fit_transform(df_scaled)
kmeans_pca = KMeans(n_clusters=k_optimal, random_state=42)
kmeans_pca.fit(X_pca)
silhouette_avg_pca = silhouette_score(X_pca, kmeans_pca.labels_)
print(f"Silhouette Score with PCA for K = {k_optimal}: {silhouette_avg_pca:.2f}")
Silhouette Score with PCA for K = 6: 0.93
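A silhouette score computed in the two-component PCA space is measured on different data than a score computed on the full feature set, so the two are not directly comparable. It can help to check how much variance the two components retain. A minimal sketch on synthetic uniform data (the array `X_demo` is an assumption, not the notebook's `df_scaled`):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for five scaled features (uniform noise, no real structure)
X_demo = rng.uniform(size=(200, 5))

pca_demo = PCA(n_components=2)
X_pca_demo = pca_demo.fit_transform(X_demo)

# Fraction of the total variance kept by the two components
retained = float(pca_demo.explained_variance_ratio_.sum())
print(X_pca_demo.shape, round(retained, 2))
```

A low retained fraction means the 2-D projection discards much of the variation, so clusters that look tight there may not be tight in the original space.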
In [ ]:
# Step 1: Calculate WCSS and silhouette scores on the scaled data
wcss = []  # Within-cluster sum of squares
silhouette_scores = []  # Silhouette scores for each K
# Test K values from 2 to 10
K_range = range(2, 11)
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42)  # Initialize KMeans with k clusters
    kmeans.fit(df_scaled)  # Fit on df_scaled (X_pca is not used in this cell)
    wcss.append(kmeans.inertia_)  # Append WCSS (inertia)
    silhouette_scores.append(silhouette_score(df_scaled, kmeans.labels_))  # Append silhouette score
# Step 2: Plot the Elbow Method
plt.figure(figsize=(10, 6))
plt.plot(K_range, wcss, marker='o', linestyle='--', color='b', label='WCSS (Elbow)')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('WCSS')
plt.title('Elbow Plot for Scaled Data')
plt.legend()
plt.show()
# Step 3: Plot Silhouette Scores
plt.figure(figsize=(10, 6))
plt.plot(K_range, silhouette_scores, marker='o', linestyle='--', color='g', label='Silhouette Score')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Scores for Scaled Data')
plt.legend()
plt.show()
In [ ]:
# Step 1: Choose optimal K (e.g., from elbow or silhouette plot)
optimal_k = 6  # Replace with the chosen value of K
# Step 2: Fit KMeans on df_scaled (all of its current columns) and store the labels
kmeans = KMeans(n_clusters=optimal_k, random_state=42)
df_scaled['Cluster'] = kmeans.fit_predict(df_scaled)  # Add cluster labels to df_scaled
# Step 3: Keep a copy of the labels under a more descriptive column name
df_scaled['Cluster_KMeans'] = df_scaled['Cluster']
In [ ]:
df_scaled.sample(3)
Out[ ]:
| | High School GPA | Swimming Time 100m (sec) | Swimming Time 200m (sec) | Swimming Time 400m (sec) | Years Competitive Swimming | KM | Cluster | Cluster_KMeans |
|---|---|---|---|---|---|---|---|---|
| 473 | 0.150754 | 0.871407 | 0.540225 | 0.274729 | 0.777778 | 5 | 4 | 4 |
| 102 | 0.683417 | 0.806858 | 0.377313 | 0.271268 | 0.222222 | 4 | 1 | 1 |
| 424 | 0.341709 | 0.177005 | 0.037208 | 0.822281 | 0.333333 | 2 | 2 | 2 |
In [ ]:
# Step 1: Calculate the mean of every df_scaled column grouped by K-Means cluster
cluster_means = df_scaled.groupby('Cluster_KMeans').mean()
# Step 2: Plot heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(cluster_means, annot=True, cmap='coolwarm', fmt='.2f') # Create heatmap with annotations
plt.title('Heatmap of Feature Means by Cluster (K-Means)')
plt.show()
In [ ]:
# Step 22.3: Use DBSCAN Clustering
# Use DBSCAN, which is a density-based clustering technique that can help improve clustering in non-spherical datasets.
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(df_scaled)
In [ ]:
# Evaluate DBSCAN Performance
silhouette_avg_dbscan = silhouette_score(df_scaled, dbscan_labels) if len(set(dbscan_labels)) > 1 else -1
print(f"Silhouette Score for DBSCAN: {silhouette_avg_dbscan:.2f}")
Silhouette Score for DBSCAN: 0.63
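DBSCAN labels noise points as -1, and `silhouette_score` treats whatever labels it receives as clusters, so noise is scored as if it were one more cluster. A possible variant that scores only the clustered points is sketched below on synthetic blobs with three planted outliers; the data and variable names are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Two tight synthetic blobs plus three isolated outliers
X_blobs, _ = make_blobs(
    n_samples=200, centers=[(0.0, 0.0), (5.0, 5.0)], cluster_std=0.3, random_state=0
)
outliers = np.array([[20.0, 20.0], [-20.0, 10.0], [10.0, -20.0]])
X_demo = np.vstack([X_blobs, outliers])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_demo)

# Drop the noise points (label -1) before scoring
mask = labels != -1
n_clusters = len(set(labels[mask]))
score = silhouette_score(X_demo[mask], labels[mask]) if n_clusters > 1 else float("nan")
print(n_clusters, int((~mask).sum()), round(float(score), 2))
```

Reporting the number of noise points next to the score keeps the comparison with K-Means honest, since excluding noise tends to make the remaining clusters look cleaner.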
In [ ]:
# Fitting DBSCAN
dbscan = DBSCAN() # Initialize DBSCAN with default parameters
dbscan_labels = dbscan.fit_predict(df_scaled) # Fit DBSCAN and get cluster labels
df['DBSCAN'] = dbscan_labels # Add DBSCAN cluster labels to the original dataframe
In [ ]:
# Silhouette scores for the epsilon values under evaluation
eps_values = [0.5]  # List of epsilon values to evaluate
silhouette_dbscan = []  # List to store (epsilon, silhouette score) pairs
for eps in eps_values:
    dbscan = DBSCAN(eps=eps)  # Initialize DBSCAN with a specific epsilon
    labels = dbscan.fit_predict(df_scaled)  # Fit DBSCAN and get labels
    if len(set(labels)) > 1:  # silhouette_score needs at least two distinct labels (noise, -1, counts as one)
        silhouette_avg = silhouette_score(df_scaled, labels)  # Calculate silhouette score
        silhouette_dbscan.append((eps, silhouette_avg))  # Append the epsilon and silhouette score to the list
In [ ]:
# Printing silhouette scores
for eps, score in silhouette_dbscan:
    print(f'Epsilon: {eps}, Silhouette Score: {score}')
Epsilon: 0.5, Silhouette Score: 0.6253645793716519
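Only one epsilon value is evaluated above. A common way to find candidate values is a k-distance curve: sort every point's distance to its k-th nearest neighbour and look for the bend. The sketch below computes that curve on synthetic data; `k_distances` is a hypothetical helper, and tying `k` to DBSCAN's `min_samples` is a convention assumed here rather than something this notebook does.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distances(X, k=5):
    """Sorted distance from every point to its k-th nearest neighbour (itself excluded)."""
    # n_neighbors=k + 1 because each training point is returned as its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    return np.sort(dists[:, -1])

rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(500, 5))  # synthetic stand-in for the scaled features
kd = k_distances(X_demo, k=5)
print(kd[[0, len(kd) // 2, -1]].round(3))
```

Plotting `kd` against its index and choosing epsilon near the point where the curve turns upward gives a list of values to put into `eps_values`.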
In [ ]:
# Plotting silhouette scores and clusters for DBSCAN
eps_values = [0.5]
plt.figure(figsize=(15, 15))
for idx, eps in enumerate(eps_values):
    dbscan = DBSCAN(eps=eps)
    labels = dbscan.fit_predict(df_scaled)
    if len(set(labels)) > 1:
        silhouette_avg = silhouette_score(df_scaled, labels)
        plt.subplot(3, 2, idx + 1)
        plt.title(f'Epsilon: {eps}, Silhouette Score: {silhouette_avg:.2f}')
        # Convert df_scaled to a NumPy array and plot its first two columns
        data = np.asarray(df_scaled)
        plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis', s=50)
plt.tight_layout()
plt.show()
In [ ]:
# Select the five scaled feature columns from df_scaled into a new DataFrame
df_scaled_dbscan = pd.DataFrame(df_scaled, columns=['High School GPA', 'Swimming Time 100m (sec)',
'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
'Years Competitive Swimming'])
df_scaled_dbscan['DBSCAN'] = dbscan.labels_
# Create heatmap
dbscan_feature_means = df_scaled_dbscan.groupby('DBSCAN').mean()
plt.figure(figsize=(10, 6))
sns.heatmap(dbscan_feature_means, annot=True, cmap='coolwarm')
plt.title('Heatmap of Feature Means by DBSCAN Cluster')
plt.show()
In [ ]:
# Create boxplots for all numerical features
features_to_plot = ['Years Competitive Swimming', 'High School GPA',
                    'Swimming Time 100m (sec)', 'Swimming Time 200m (sec)',
                    'Swimming Time 400m (sec)']
# 2x4 grid of subplots; only the first five axes are used
fig, axes = plt.subplots(2, 4, figsize=(20, 12))
axes = axes.flatten()
for i, feature in enumerate(features_to_plot):
    sns.boxplot(x='DBSCAN', y=feature, data=df, palette='viridis', ax=axes[i])
    axes[i].set_title(f'Boxplot of {feature} by Cluster')
    axes[i].set_xlabel('Cluster')
    axes[i].set_ylabel(feature)
# Hide the unused axes in the grid
for ax in axes[len(features_to_plot):]:
    ax.set_visible(False)
plt.tight_layout()
plt.show()
In [ ]:
# Cluster Density Plot with correct feature names
plt.figure(figsize=(10, 6))
sns.scatterplot(x='High School GPA', y='Swimming Time 100m (sec)',
hue='DBSCAN', data=df, palette='viridis')
plt.title('DBSCAN Cluster Density Plot')
plt.show()
In [ ]:
# Cluster Density Plot with correct feature names
plt.figure(figsize=(10, 6))
sns.scatterplot(x='High School GPA', y='Swimming Time 200m (sec)',
hue='DBSCAN', data=df, palette='viridis')
plt.title('DBSCAN Cluster Density Plot')
plt.show()
In [ ]:
# Cluster Density Plot with correct feature names
plt.figure(figsize=(10, 6))
sns.scatterplot(x='High School GPA', y='Swimming Time 400m (sec)',
hue='DBSCAN', data=df, palette='viridis')
plt.title('DBSCAN Cluster Density Plot')
plt.show()
In [ ]:
# Cluster Density Plot with correct feature names
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Years Competitive Swimming', y= 'High School GPA',
hue='DBSCAN', data=df, palette='viridis')
plt.title('DBSCAN Cluster Density Plot')
plt.show()
In [ ]:
# Scaling the data for hierarchical clustering
scaler = MinMaxScaler()
df_scaled_hc = scaler.fit_transform(df[['High School GPA',
'Swimming Time 100m (sec)',
'Swimming Time 200m (sec)',
'Swimming Time 400m (sec)',
'Years Competitive Swimming']])
In [ ]:
# Plotting the dendrogram
plt.figure(figsize=(18, 7))
dendrogram = sch.dendrogram(sch.linkage(df_scaled_hc, method='ward')) # Create dendrogram using Ward's linkage
plt.title('Dendrogram for Hierarchical Clustering')
plt.xlabel('students')
plt.ylabel('Euclidean Distances')
plt.show()
In [ ]:
# Fitting the hierarchical clustering to the dataset
hc = AgglomerativeClustering(n_clusters=4, metric='euclidean', linkage='ward') # Initialize Agglomerative Clustering with 4 clusters
hc_labels = hc.fit_predict(df_scaled_hc) # Fit the model and get cluster labels
df['Hierarchical'] = hc_labels # Add the cluster labels to the original dataframe
In [ ]:
# Silhouette score for Hierarchical clustering
silhouette_hc = silhouette_score(df_scaled_hc, hc_labels) # Calculate silhouette score for hierarchical clustering
print(f'Silhouette Score for Hierarchical Clustering: {silhouette_hc}') # Print silhouette score
Silhouette Score for Hierarchical Clustering: 0.11617031200512948
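The hierarchical model above is fit with a fixed `n_clusters=4`. Because a linkage tree can be cut at any level, several cluster counts can be compared from one tree. The sketch below does this with SciPy's `fcluster` on synthetic blobs; the data and the choice of silhouette as the selection criterion are assumptions for illustration.

```python
import scipy.cluster.hierarchy as sch
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (not the scholarship dataset)
X_demo, _ = make_blobs(
    n_samples=150,
    centers=[(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)],
    cluster_std=0.5,
    random_state=0,
)

# Build the Ward linkage once, then cut the same tree at several cluster counts
Z = sch.linkage(X_demo, method="ward")
scores = {}
for k in range(2, 7):
    cut_labels = sch.fcluster(Z, t=k, criterion="maxclust")
    scores[k] = silhouette_score(X_demo, cut_labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

On the notebook's data the same loop could be run on `df_scaled_hc`; a uniformly low score across all cuts would suggest the features contain little cluster structure.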
In [ ]:
# Heatmap for Hierarchical clustering with improved visualization
hierarchical_feature_means = df.groupby('Hierarchical').mean()
plt.figure(figsize=(12, 8))
sns.heatmap(hierarchical_feature_means[['High School GPA',
'Swimming Time 100m (sec)',
'Swimming Time 200m (sec)',
'Swimming Time 400m (sec)',
'Years Competitive Swimming']],
annot=True,
cmap='coolwarm',
fmt='.2f',
cbar_kws={'label': 'Mean Value'},
yticklabels=['Cluster ' + str(x) for x in range(len(hierarchical_feature_means))])
plt.title('Feature Means by Hierarchical Cluster', pad=20, fontsize=14)
plt.ylabel('Clusters')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
In [ ]:
# Heatmap for Hierarchical clustering with all features
hierarchical_feature_means = df.groupby('Hierarchical').mean()
# Create heatmap with all features
plt.figure(figsize=(16, 10))
sns.heatmap(hierarchical_feature_means,
annot=True,
cmap='coolwarm',
fmt='.2f',
cbar_kws={'label': 'Mean Value'},
yticklabels=['Cluster ' + str(x) for x in range(len(hierarchical_feature_means))])
plt.title('Feature Means by Hierarchical Cluster', pad=20, fontsize=14)
plt.ylabel('Clusters')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
In [ ]:
# Create boxplots for selected features
features_to_plot = ['High School GPA', 'Swimming Time 100m (sec)', 'Swimming Time 200m (sec)',
'Swimming Time 400m (sec)', 'Years Competitive Swimming']
fig, axes = plt.subplots(1, 5, figsize=(20, 6))
for i, feature in enumerate(features_to_plot):
    sns.boxplot(x='Hierarchical', y=feature, data=df, palette='viridis', ax=axes[i])
    axes[i].set_title(f'Boxplot of {feature} by Cluster')
    axes[i].set_xlabel('Cluster')
    axes[i].set_ylabel(feature)
plt.tight_layout()
plt.show()
In [ ]:
!jupyter nbconvert --to html '/content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.ipynb'
[NbConvertApp] Converting notebook /content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.ipynb to html [NbConvertApp] Writing 1376712 bytes to /content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.html
In [ ]:
!apt-get install -y pandoc
In [ ]:
!apt-get install texlive-xetex texlive-fonts-recommended texlive-plain-generic
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
[... package lists omitted ...]
0 upgraded, 54 newly installed, 0 to remove and 49 not upgraded.
Need to get 182 MB of archives.
After this operation, 571 MB of additional disk space will be used.
[... per-package download, unpack, and setup lines omitted ...]
Fetched 182 MB in 3s (52.0 MB/s)
Setting up texlive-xetex (2021.20220204-1) ...
Processing triggers for tex-common (6.17) ...
Running updmap-sys. This may take some time... done.
Running mktexlsr /var/lib/texmf ... done.
Building format(s) --all. This may take some time... done.
In [ ]:
!jupyter nbconvert --to pdf '/content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.ipynb'
[NbConvertApp] Converting notebook /content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.ipynb to pdf
[NbConvertApp] Support files will be in Assignment_1_Ndumnwere_Ezinne_files/
[NbConvertApp] Making directory ./Assignment_1_Ndumnwere_Ezinne_files
[NbConvertApp] Writing 87292 bytes to notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', 'notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', 'notebook']
[NbConvertApp] WARNING | bibtex had problems, most likely because there were no citations
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 552669 bytes to /content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.pdf
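The export step can also be run from Python, which makes it easy to confirm that the notebook exists before calling `jupyter nbconvert`. This is only a minimal sketch: the helper names `build_nbconvert_command` and `export_notebook_to_pdf` are hypothetical (they are not part of nbconvert), and the path is copied from the cell above, so adjust it to the notebook you want to export.

```python
import os
import subprocess


def build_nbconvert_command(notebook_path):
    """Return the argument list for `jupyter nbconvert --to pdf` (hypothetical helper)."""
    return ["jupyter", "nbconvert", "--to", "pdf", notebook_path]


def export_notebook_to_pdf(notebook_path):
    """Convert a notebook to PDF if it exists (hypothetical helper).

    Returns the expected PDF path, or None when the notebook is missing.
    """
    if not os.path.exists(notebook_path):
        print(f"Notebook not found: {notebook_path}")
        return None
    # Passing the arguments as a list avoids shell-quoting problems
    # with the space in 'My Drive'.
    subprocess.run(build_nbconvert_command(notebook_path), check=True)
    # By default nbconvert writes the PDF next to the notebook.
    return os.path.splitext(notebook_path)[0] + ".pdf"


# Assumption: same path as in the cell above; change it as needed.
notebook = "/content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.ipynb"
pdf_path = export_notebook_to_pdf(notebook)
```

With `check=True`, a failed conversion raises `subprocess.CalledProcessError` rather than passing silently.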